ABSTRACT

In recent years, both homing endonucleases 
(HEases) and zinc-finger nucleases (ZFNs) have 
been engineered and selected for the targeting of 
desired human loci for gene therapy. However, 
enzyme engineering is lengthy and expensive and 
the off-target effect of the manufactured endonucle- 
ases is difficult to predict. Moreover, enzymes 
selected to cleave a human DNA locus may not 
cleave the homologous locus in the genome of 
animal models because of sequence divergence, 
thus hampering attempts to assess the in vivo 
efficacy and safety of any engineered enzyme prior 
to its application in human trials. Here, we show that 
naturally occurring HEases can be found that
cleave desirable human targets. Some of these 
enzymes are also shown to cleave the homologous 
sequence in the genome of animal models. In 
addition, the distribution of off-target effects may 
be more predictable for native HEases. Based on 
our experimental observations, we present the 
HomeBase algorithm, database and web server 
that allow a high-throughput computational search 
and assignment of HEases for the targeting of 
specific loci in the human and other genomes. 
We experimentally validate the predicted target specificity
of candidate fungal, bacterial and archaeal
HEases using cell-free, yeast and archaeal assays.

INTRODUCTION 

Gene targeting, the site-specific manipulation of the 
genome, is the holy grail of gene therapy and genetic
engineering. It promises to markedly reduce the risks
associated with viral vector-mediated gene insertion, 
most notably, the risks of induced oncogene 
overexpression and insertional mutagenesis (1). Gene 
manipulation at a locus of choice is best facilitated by 
the introduction of a site-specific double-stranded DNA 
break (DSB). The default repair of the DSB by non- 
homologous end joining (NHEJ) can lead to gene disrup- 
tion. In the presence of an appropriate donor template, the 
break can be repaired by homologous recombination 
(HR) leading to gene correction or gene insertion at the 
desired locus. Indeed, in recent years, much effort has 
been invested in the engineering of site-specific DNA 
endonucleases that can cleave desired loci in the human 
genome and induce gene targeting. Impressive results 
came from the use of zinc-finger nucleases (ZFNs), 
chimeric enzymes consisting of an endonuclease domain 
that is artificially linked to a site-specific array of zinc- 
finger domains (2). Of special note is the use of ZFNs 
engineered for the specific ex vivo disruption of the HIV 
coreceptor CCR5 in the T lymphocytes of AIDS patients, 
now under clinical trials (3) (see: http://clinicaltrials.gov/ 
ct2/show/NCT00842634). Another promising option is 
presented by meganucleases, engineered homing endo- 
nucleases, selected to cleave a locus of choice [e.g. XPC 
(4), RAG (5)]. ZFNs and meganucleases have also been
used in crop bio-engineering (6), in the production of
model cell lines (7,8), animal models (9), induced pluripotent
stem cells (10,11) and more.

The obvious advantage of using engineered endo- 
nucleases is the ability to target almost any gene of 
choice. However, the production of site-specific ZFNs 
and meganucleases is burdensome, lengthy and expensive. 
Moreover, the rate and distribution of off-target cleavage 
for these enzymes is difficult to predict (12). Importantly, 
safety assessments for engineered endonucleases are
hindered by the fact that an enzyme selected to cleave a
human locus will seldom cleave the homologous gene of 
an animal model because of sequence divergence. Here we 
show that native homing endonucleases (HEases), with 
their predictable target range and conservation of target 
sites in animal models, may present a promising alterna- 
tive to engineered endonucleases for gene targeting. 

HEases are a large and diverse class of site-specific 
nucleases found in Archaea, Bacteria and Eukarya and 
in their respective viruses (13,14). HEase genes (HEGs) 
are selfish genetic elements that reside as open reading
frames (ORFs) within self-splicing introns or as an endonuclease
domain within inteins (15). An HEase promotes
the horizontal propagation of its respective intron/intein 
into an intron-less or intein-less allele by cleaving the 
vacant allele to induce HR or reverse transcription, 
which results in effectively copying the intron/intein 
together with the HEG into the same position in the 
vacant allele (Figure 1). Importantly for their use in 
gene therapy, HEases possess the ability to introduce 
highly specific breaks in the human genome due to their 
long target sequences (14-40 bp). Indeed, native HEases 
are able to induce either site-specific NHEJ or site-specific 
HR in mammalian genomes engineered so as to contain 
the HEase's target site (16-19). However, native HEases 
have not been used for gene therapy to date, probably 
because of the common misperception that they have no 
targets in the human genome. Instead, as mentioned 
above, selected HEases were subjected to rational engin- 
eering and directed evolution, so that they could target 
disease-associated genes (4,20). Nevertheless, basic 
research into native HEases has revealed some features 
with implications for their potential use in gene therapy. 




Figure 1. The Homing process. The homing endonuclease (HEase) is 
expressed from the HEG (red), residing in an intron or as an in-frame 
domain of an intein (purple) in a hosting gene (cyan). It cleaves the 
target site (orange) in a vacant homolog of the hosting gene to induce 
homologous recombination (gene conversion or double crossover), 
turning the vacant homolog into a HEG-carrying one. 



Importantly, the target sites of HEases are not strin- 
gently defined: some nucleotides along the target site can 
be substituted while cleavage efficiency is retained (21). 
Without a general way to predict the plasticity in HEase 
target recognition, it might have been very difficult to 
apply HEases as tools for gene therapy. However, 
accumulating evidence suggests that HEase plasticity is,
at least to some extent, predictable based on
evolutionary considerations (22-26). It has long been
appreciated that inteins and self-splicing introns tend to 
be found in conserved motifs of essential genes (27). 
Hence, any partial or inaccurate intron/intein deletion or 
mutation at a splicing motif is detrimental to the host, 
which is left with a persistent disruption in a critical 
gene. The self-serving localization of the intron/intein is 
also beneficial for the HEase. An HEase promotes the 
copying of its respective intron/intein into its target site 
and therefore, HEase targets coincide with intron/intein 
insertion sites (Figure 1). In particular, a conserved inser- 
tion site is also a conserved HEase target site. Therefore, 
the host cannot easily evade the parasite by altering the 
target sequence. 

Nevertheless, even highly conserved loci usually include 
some variable sites. HEases have therefore evolved so that 
they rely on conserved positions within the target sequence 
for robust target recognition (24-26). In particular, when 
the hosting gene encodes a protein, selection on this gene 
acts mainly to conserve its amino acid translation rather 
than the coding nucleotide sequence. Synonymous substi- 
tutions are therefore frequent even in sequences coding 
conserved protein motifs. Thus, HEases are expected to 
evolve tolerance to silent target mutations. Indeed, 
Kurokawa et al. (22) have demonstrated, for an array of
intron-encoded HEases residing in protein-coding genes, 
that single silent mutations in the target sites are far better 
tolerated than mutations that alter the coded polypeptide. 
Scalley-Kim et al. (23) have examined the specificity 
profile of the I-AniI HEase and found it to be strongly
correlated with wobble versus non-wobble positions and 
also with the degree of degeneracy inherent in individual 
codons. At the focus of the above studies were HEases of 
the LAGLIDADG family, which is the most abundant 
structural family of HEases. However, similar evolution- 
ary considerations may apply to the binding and cleavage 
specificity of GIY-YIG HEases (28,29) which have a 
highly distinct mode of DNA interaction. 

In this work, we reasoned that the predictability of 
HEase target recognition may allow the discovery of 
formerly unidentified HEase targets in the human and 
other genomes, for the benefit of gene therapy and 
genetic engineering. We first demonstrate that HEases 
residing in protein-coding genes are often tolerant of con- 
comitant synonymous substitutions at all wobble pos- 
itions in their target site. This is a generalization of 
previous reports (22-26) that allows finding HEase 
targets in formerly unidentified loci in the human and 
other genomes. To apply this principle to as many 
HEases as possible, we searched all public sequence data- 
bases for novel HEGs. At this stage, we relied on another 
property of naturally occurring HEGs, which is that the 
gene coding the enzyme and the target sequence of
the enzyme are found on the same locus. In particular, the
target sequence flanks the intron/intein insertion site 
(Figure 1). This property allowed us to predict a 
putative target sequence for each newly discovered 
HEase. The results of this search were compiled in the 
form of the HomeBase database. Finally, we experimen- 
tally validated the predicted specificity range of candidate 
fungal, bacterial and archaeal HEases using cell-free, yeast 
and archaeal assays. Thereby, a large arsenal of naturally 
occurring HEases was compiled together with their pre- 
dicted target ranges, providing a diverse toolbox of 
specific cutters for the genetic manipulation of large and 
complex genomes. 

MATERIALS AND METHODS 

Experimental methods 

Strains, plasmids and oligonucleotides. Supplementary
Table S1 lists the strains, plasmids, oligonucleotides and
PCR primers used in this study. For extra-cellular
cleavage assays we inserted the native, mutated or
predicted target sites of PI-SceI and PI-PspI into
pGEM-T Easy (Promega, Figure 2a and b) or into the
PfoI site of pDELT (Figure 2d). pDELT was constructed
by cleaving pRS304 (30) with SmaI and inserting a 427-bp
long PCR segment of the Saccharomyces cerevisiae LYS2
gene, amplified using the OI3 and B82 primers.

The archaeal homing assay was conducted using the
Haloferax volcanii strain WR532 (H133) ΔpyrE2 (31).
The Archaea were transformed with the pTA131
(32), pTA1.1 or pTA1.1hum plasmid.
pTA1.1 and pTA1.1hum are derivatives of pTA131.
pTA1.1 carries, between the EcoRI and SpeI sites, a 1.1-kb
long PCR segment of the H. volcanii POLB gene lacking the
POLB intein (33) that was amplified using the Hvol 1.1 F
and R primers. pTA1.1hum carries a similar segment
in which a BmgBI fragment including the native target of
the POLB HEase was replaced by a fragment carrying the
predicted human target.

The S. cerevisiae strains used for the yeast HEase assay
are all derivatives of OI50 (Supplementary Table S1).
In the following description of the construction of
the derivatives, X stands for any one of: (i) the Botrytis
cinerea PRP8 HEase native target; (ii) the B. cinerea
PRP8 HEase predicted human target; (iii) the B. cinerea
PRP8 HEase predicted mouse target; (iv) the Nostoc RNR
HEase native target; (v) the Nostoc RNR HEase predicted
N. punctiforme target; (vi) the Nostoc RNR HEase predicted
Synechococcus target. Yeast strains with the YDEUH prefix
(YDEUH + X target) were constructed by transforming
OI50 with NcoI-cleaved pDEUH derivatives
(pDEUH + X target). pDEUH is pRS303 (30) carrying a
315-bp long PCR segment of the S. cerevisiae URA3 gene
(amplified using the OI5 and B45 PCR primers) at its
HincII site. The pDEUH derivatives each have a different
HEase target at the PfoI site of pDEUH. Yeast strains
with the YDELT prefix (YDELT + X target) were constructed
by transforming OI50 with HpaI-cleaved pDELT derivatives
(pDELT + X target). The pDELT derivatives each
have a different HEase target at the PfoI site of
pDELT. As explained above, pDELT is pRS304 (30)
carrying a 427-bp long PCR segment of the S. cerevisiae
LYS2 gene.

Yeast strains with the YDEUHLT prefix (YDEUHLT + X
target) were constructed by transforming YDEUH + X
target with HpaI-cut pDELT + X target. Final constructs
of the form YDEUHLT + X target + pGML + Y HEase
were constructed by transforming YDEUHLT + X target
with a pGML10 (34) derivative encoding the Y HEase
(either the B. cinerea PRP8 HEase or the Nostoc RNR HEase).
The B. cinerea PRP8 HEase was amplified from the
B. cinerea strain B05.10, a kind gift from Professor
Annika Bokor (35), using the primers B. cinerea-HEN-F
and R. The forward primer includes an ATG start
codon and an SV40 nuclear localization signal (NLS).
The amplified B. cinerea PRP8 HEase was inserted
between the XbaI and XmaI sites of pGML10. The
Nostoc RNR HEase was amplified from Nostoc
(Anabaena) sp. PCC 7120, a kind gift from Professor
Sammy Boussiba (36), using the primers Nostoc-HEN-F
and R. Here again, the forward primer includes an ATG
start codon and an SV40 NLS. The amplified Nostoc
RNR HEase was inserted between the BamHI and
EcoRI sites of pGML10.

Extra-cellular cleavage assays. A quantity of 1 µg of a
plasmid [pGEM derivative (Figure 2a and b) or pDELT
derivative (Figure 2d)] carrying a native, mutated or
predicted target of PI-SceI (Figure 2a and d) or PI-PspI
(Figure 2b) was subjected to cleavage using 1 U of the
respective enzyme as provided by New England Biolabs (at
29 pmol/U for PI-SceI and 80 fmol/U for PI-PspI), in
a 50 µl reaction at 37°C (PI-SceI) or 65°C (PI-PspI).
Aliquots were extracted every 30 min (Figure 2a), every 15 min
(Figure 2b) or after a 16 h overnight incubation (Figure
2d). PI-SceI was heat-inactivated (20 min, 65°C) and all
samples were fragmented by a restriction endonuclease
[BspHI (Figure 2a and b) or XbaI (Figure 2d) for 1 h in
a 10 µl reaction] prior to gel electrophoresis.

Archaeal homing assay. For the transformation of
H. volcanii, a liquid culture (1.5 ml; OD600nm of 1.5) was
centrifuged at 3500 g for 5 min. The supernatant was discarded,
the cells were resuspended in 200 µl spheroplasting
solution (1 M NaCl, 27 mM KCl, 50 mM Tris-HCl pH
8.2, 15% sucrose) and incubated at room temperature
for 5 min. A quantity of 20 µl of 0.5 M EDTA was
added and cells were incubated at room temperature for
10 min. Then, 10 µl of purified plasmid DNA was mixed
with 15 µl spheroplasting solution and 5 µl of 0.5 M EDTA
and added to the cells, followed by incubation for 5 min at
room temperature. Subsequently, 240 µl of PEG solution
(60% PEG 600 in spheroplasting solution) was added and
cells were incubated for an additional 20 min at room temperature.
Following the incubation, 1 ml of regeneration
solution (3.4 M NaCl, 175 mM MgSO4, 34 mM KCl,
5 mM CaCl2, 50 mM Tris-HCl pH 7.2, 15% sucrose)
was added and cells were centrifuged at 3500 g for 7 min.
The supernatant was discarded and cells were resuspended
in HY medium supplemented with 15% sucrose and left to
incubate without shaking overnight at 37°C. The cultures
were then washed and plated on selective media.
The presence of an intein on the exogenous plasmids indicates
that homing has taken place and in particular that
efficient cleavage has occurred. Intein presence was tested 
by PCR using the RP2 and M13F primers. The standard 
errors were calculated based on a binomial distribution 
with an added pseudo-count of 0.5 successes and 0.5 
failures. 
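
The standard-error calculation described above can be illustrated with a minimal Python sketch. It assumes one plausible reading of the text: the pseudo-counts are added to the observed successes and failures before the usual binomial standard error is computed. The function name and the example counts are hypothetical and are not taken from the study.

```python
import math

def binomial_se_with_pseudocount(successes, trials, pseudo=0.5):
    """Estimate a proportion and its binomial standard error after adding
    `pseudo` successes and `pseudo` failures (assumed interpretation)."""
    n = trials + 2 * pseudo                 # effective number of trials
    p = (successes + pseudo) / n            # adjusted proportion
    se = math.sqrt(p * (1.0 - p) / n)       # binomial standard error
    return p, se

# Hypothetical example: 5 intein-positive colonies out of 9 tested by PCR.
p_hat, se_hat = binomial_se_with_pseudocount(5, 9)
print(f"homing frequency = {p_hat:.2f} +/- {se_hat:.2f}")
```

One practical consequence of the pseudo-count is that the standard error stays above zero even when no colony (or every colony) scores positive.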

Yeast HEase assay. Prior to the recombination assay, 
dilutions of four independent colonies of each strain 
were plated on YEPD medium (1% yeast extract, 2% 
Bacto Peptone, 2% dextrose, 2% Bacto-agar) and on 
YEP-GAL medium (1% yeast extract, 2% Bacto 
Peptone, 2% galactose, 2% Bacto-agar) in order to 
assess the toxicity of each enzyme (Supplementary 
Figure S3). For the recombination assay, each of the 
four colonies of each strain was grown overnight at 
30°C in a synthetic complete (SC) liquid medium supple- 
mented with 2% galactose (for strains without an HEase 
expression plasmid), or SC-Leucine liquid medium 
supplemented with 2% galactose (for strains with an 
HEase expression plasmid, marked with the LEU2 
gene). Cells were then pelleted, diluted and plated on
YEPD, on SD-Ura (synthetic complete medium minus
uracil + 2% dextrose + 2% Bacto-agar) and on SD-Ura-Lys
(synthetic complete medium minus uracil minus lysine + 2%
dextrose + 2% Bacto-agar) to assess recombination rate
(implying HEase activity rate). HEase activity rate is
defined as the average fold increase in colony formation 
on the selective medium of the strain with target X and 
HEase Y with respect to the average colony formation on 
the selective medium of the strains without any HEase. 
The confidence intervals for the fold increase were 
calculated by Monte Carlo sampling of pairs of simulated 
measurements from two normal distributions having the 
same mean and variance as the actual measurements. 
The 95% confidence intervals used are the 2.5th and 
97.5th quantiles of the emerging distribution of ratios. 
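
The Monte Carlo procedure for the fold-increase confidence interval can be sketched as follows. This is a minimal illustration rather than the code used in the study: the colony counts are hypothetical, the number of draws and the random seed are arbitrary, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not specify it.

```python
import random
import statistics

def fold_increase_ci(with_hease, without_hease, n_draws=10000, seed=0):
    """Return (fold increase, 2.5th quantile, 97.5th quantile) of the ratio
    of two normal variables fitted to the two sets of measurements."""
    rng = random.Random(seed)
    mu_w, sd_w = statistics.mean(with_hease), statistics.stdev(with_hease)
    mu_c, sd_c = statistics.mean(without_hease), statistics.stdev(without_hease)
    ratios = []
    for _ in range(n_draws):
        numerator = rng.gauss(mu_w, sd_w)      # simulated strain with HEase
        denominator = rng.gauss(mu_c, sd_c)    # simulated strain without HEase
        if denominator != 0:
            ratios.append(numerator / denominator)
    ratios.sort()
    lower = ratios[int(0.025 * (len(ratios) - 1))]
    upper = ratios[int(0.975 * (len(ratios) - 1))]
    return mu_w / mu_c, lower, upper

# Hypothetical colony counts from four independent colonies per strain.
fold, lower, upper = fold_increase_ci([200, 220, 180, 210], [10, 12, 9, 11])
print(f"fold increase = {fold:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

Sampling the ratio directly avoids assuming that the ratio itself is normally distributed, which is why the resulting interval is generally not symmetric around the point estimate.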

Construction of the HomeBase HEase database 

Search for HEGs in DNA databases (BLAST-1). A 
homology search for novel HEGs was conducted across 
all available DNA data sets. A set of known HEGs was 
used as queries for BLAST searches. Protein sequences of 
known HEases in introns and inteins were retrieved from 
manually curated databases. A total of 289 sequences of 
HEGs in Group I introns were downloaded from the 
Group I Intron Sequence and Structure Database (37) 
(GISSD; http://www.rna.whu.edu.cn/gissd). Three 
hundred twenty five sequences of HEGs in inteins were 
downloaded from INBASE (38) (http://www.neb.com/ 
neb/inteins.html). In both introns and inteins, the 
manual curation of these databases ensures that these 
protein sequences do not include exonic or exteinic 
parts. This is essential for our purpose because otherwise 
the BLAST searches will retrieve many homologs of the 
hosting gene instead of novel HEGs. This sequence set,
totaling 614 HEGs, will subsequently be referred to as the
'known HEGs set'. Using translated BLAST (tblastn), the
known HEase protein sequences were used as queries to
search against all six possible reading-frame translations of the
non-redundant nucleotide database nt and the
non-redundant environmental (metagenomic) nucleotide
database env_nt, both downloaded from the NCBI web
site (http://www.ncbi.nlm.nih.gov/ftp/). We retained all
hits with an E-value < 10. This stage will subsequently be
referred to as BLAST-1.

For each hit sequence in the BLAST-1 results there are 
often several high scoring pairs (HSPs), pairwise align- 
ments of a query subsequence to a hit subsequence 
(which is translated in one of the six possible reading 
frames). Furthermore, a novel HEG sequence is usually 
homologous to several of the known HEGs, therefore 
several queries yield overlapping HSPs in the same hit 
locus. The number of HSPs per hit sequence is often 
large for whole chromosome sequences, which may 
contain several loci of HEase homology. To divide such
hit sequences into distinct loci, all HSPs from all queries on
the same hit sequence were clustered. Overlapping HSPs and
neighboring HSPs <2000 bases apart were clustered, and
1000 bases were added on either side of every cluster.
These hit subsequences are the putative novel HEGs and 
their adjacent intron/intein and exon/extein sequences. 
This procedure often resulted in a cluster of several 
HEG-containing intron/inteins in the same host gene. 
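
The clustering step described above amounts to merging intervals on each hit sequence. The following Python sketch shows one way to do this under simplifying assumptions: HSPs are given as (start, end) coordinates on the hit sequence, strand and reading frame are ignored, and the flanks are clipped at the sequence ends. The function name and the example coordinates are hypothetical.

```python
def cluster_hsps(hsps, max_gap=2000, flank=1000, seq_len=None):
    """Merge overlapping HSPs and HSPs less than `max_gap` bases apart,
    then extend every cluster by `flank` bases on either side."""
    clusters = []
    for start, end in sorted(hsps):
        if clusters and start - clusters[-1][1] < max_gap:
            # Overlapping or close to the current cluster: extend it.
            clusters[-1][1] = max(clusters[-1][1], end)
        else:
            clusters.append([start, end])
    padded = []
    for start, end in clusters:
        left = max(0, start - flank)
        right = end + flank if seq_len is None else min(seq_len, end + flank)
        padded.append((left, right))
    return padded

# Hypothetical HSP coordinates on a 21 000-base hit sequence.
loci = cluster_hsps(
    [(5000, 5600), (5500, 6200), (7900, 8300), (20000, 20500)],
    seq_len=21000,
)
print(loci)  # [(4000, 9300), (19000, 21000)]
```

In this example the first three HSPs collapse into a single putative locus, while the fourth, more than 2000 bases away, defines a second one.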

Defining splice sites (HEase target sites) based on vacant
homologs (BLAST-2). For any gene that harbors a HEG
inside an intron/intein, a vacant homolog is a homologous 
sequence of that gene without the intron/intein. A second 
BLAST search was conducted for each putative 
HEG-containing sequence in order to identify such a 
vacant homolog. Each subsequence resulting from the 
clustering of the BLAST-1 HSPs was used as the query 
for the second BLAST search, hereafter termed BLAST-2.
In this translated BLAST (tblastx) each of the three 
possible translations of the strand that was aligned to 
the known HEG in BLAST-1 was searched against the 
same non-redundant DNA databases as above. Each 
BLAST-2 hit was checked for several criteria to determine 
if it contained a bona fide vacant homolog (Supplementary
List S1). After identifying the first intron/intein in the
BLAST-2 query, we repeated the procedure using only 
BLAST-1 HSPs that do not overlap the intron/intein. 
Only BLAST-2 HSP pairs that contain these BLAST-1 
HSPs were considered as potentially defining introns/ 
inteins. Thereby, several additional non-overlapping 
introns/inteins were often identified in the same 
BLAST-2 query until no BLAST-1 HSPs remained. 

The final outputs of the algorithm are the target 
sequence flanking these splice sites in the query (the 
pHEG-containing sequence) and the ORF of the HEase 
(in introns) or of the intein, which codes for the HEase. 
This output constitutes the HomeBase database. 

To assess the accuracy of prediction, we sampled 30 
putative HEGs and manually inspected all stages of the 
automatic pipeline, including: confidence in the BLAST-1
homology and conservation of known HEase motifs; confidence
in and accuracy of the BLAST-2 alignment to the
vacant homolog, especially of the exon/extein boundaries
after correction of gaps/overlaps (where possible we
compared the prediction to the annotation of intron
position and to the intron/intein splice-site consensus).
We found 92 ± 4% (mean ± SE) of the results to have 
reliable homology to a known HEG and 75 ± 7% to be 
true HEGs with correct identification of the target site. 
For the 222 results of the second iteration, we estimate 
an accuracy level of 25 ± 7%. The overall number of 
known HEGs was estimated by querying all protein 
records available in the Entrez engine. The search term 
used was 'homing endonuclease' [All Fields] NOT 
txid33208 [Organism:exp] (33208 is the taxonomy identifier
for Metazoa). The number of predicted HEGs that
were previously annotated as HEGs was determined by 
checking the GenBank records containing them. All 
CDS features overlapping the containing intron/intein 
were examined. In addition, if a CDS contained a 
'protein_id' tag, the corresponding GenPept record was 
obtained through Entrez and all features found were 
included in the search. For all the features selected, the 
text of the following tags was examined: 'note', 'product', 
'gene', 'gene_synonym' for CDS features and 'note', 
'region_name' and 'product' for features found in the 
protein. A HEG was considered to be previously 
annotated if any of the tags contained the term 'homing 
endonuclease'. 
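
The annotation check described above can be sketched in Python as follows. The data layout is an assumption made for illustration: each feature is represented as a dictionary mapping a tag name to a list of text values, as might be obtained after parsing a GenBank or GenPept record. Matching is case-insensitive here, which is also an assumption.

```python
CDS_TAGS = ("note", "product", "gene", "gene_synonym")
PROTEIN_TAGS = ("note", "region_name", "product")
TERM = "homing endonuclease"

def previously_annotated(cds_features, protein_features):
    """Return True if any examined tag of any feature contains the term."""
    checks = ((CDS_TAGS, cds_features), (PROTEIN_TAGS, protein_features))
    for tags, features in checks:
        for qualifiers in features:
            for tag in tags:
                for value in qualifiers.get(tag, []):
                    if TERM in value.lower():
                        return True
    return False

# Hypothetical features: the CDS is uninformative, the protein region is not.
cds = [{"product": ["hypothetical protein"]}]
protein = [{"region_name": ["LAGLIDADG homing endonuclease"]}]
print(previously_annotated(cds, protein))  # True
```

Tags outside the listed sets are deliberately ignored, so a match in an unrelated qualifier does not count as a previous annotation.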

The HomeBase database. The HomeBase database is the 
unified set of HEGs including our novel predictions and 
the known HEGs from INBASE and GISSD. To infer the
target sequences from these databases, we downloaded the
exon or extein sequences from GISSD or INBASE,
respectively, and extracted the first seven amino acids
from each exon/extein.

We classified HomeBase records into HEG families by 
homology (blastp) of the translated ORF of the putative 
HEG against a set of HEG sequences with family anno- 
tation collected from the INBASE, GISSD and Entrez 
Protein databases. This set includes representatives from 
the LAGLIDADG, HNH, GIY-YIG and His-Cys 
families. Each HomeBase HEG was assigned the family 
annotation of the top BLAST hit, except in ambiguous
cases where the top hit from a different family had
an E-value that was within a factor of <10 of the top-hit
E-value.
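
The family-assignment rule can be expressed as a short Python sketch. It assumes that the blastp results for one HomeBase record are available as (family, E-value) pairs; returning None for ambiguous records is a choice made for this illustration, since the text does not state how such cases were handled. All names are hypothetical.

```python
def assign_family(hits, ambiguity_factor=10.0):
    """Assign the family of the best hit unless the best hit from another
    family has an E-value within `ambiguity_factor` of the best E-value."""
    if not hits:
        return None
    ranked = sorted(hits, key=lambda hit: hit[1])   # lowest E-value first
    top_family, top_evalue = ranked[0]
    for family, evalue in ranked[1:]:
        if family != top_family:
            # Only the best hit from a different family matters.
            ambiguous = evalue < ambiguity_factor * top_evalue or evalue == top_evalue
            return None if ambiguous else top_family
    return top_family

print(assign_family([("LAGLIDADG", 1e-40), ("HNH", 1e-12)]))      # LAGLIDADG
print(assign_family([("LAGLIDADG", 1e-20), ("GIY-YIG", 5e-20)]))  # None
```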

Identifying candidate HEases for specific targets. To reveal 
the potential utility of the HomeBase collection for genetic 
intervention in humans, we used translated BLAST 
(tblastn) to search for a match between the translations 
of HomeBase target sequences and all possible six-frame 
translations of the human genome. Resulting hits were 
sorted by their similarity. We also required that five out
of the six central residues be identical and that the sixth
residue be at least similar. We classified the hits as genic or
intergenic depending on whether or not they reside within
an annotated gene or the 5000 bases flanking it
(ftp://ftp.ncbi.nih.gov/gene/DATA/gene2accession.gz).
Furthermore, we classified genic hits into coding exons
versus non-coding introns and flanking sequence
(ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/mapview/seq_gene.md.gz)
to identify hits that may be the result of true
protein homology (Supplementary Table S2). To
estimate the number of off-target hits for each HEase in
the human genome, we counted the number of BLAST
hits whose E-value is three times worse (larger) than the
E-value of the top hit.

RESULTS 

Native HEases can efficiently cleave targets with 
concomitant silent mutations at all wobble positions 

It has previously been shown that HEases encoded by 
genes residing in introns/inteins of protein-coding genes 
can cleave variants of their target sites bearing single or 
a few synonymous substitutions (22,23). Here, we demon- 
strate that at least some HEases can efficiently cleave 
targets with concomitant silent mutations in all codons 
of the target sequence. This is a result of the evolutionary 
history of the HEases and their hosts: mutations in the 
target site that prevent insertion of the intron/intein 
would be advantageous for the host, however, as most 
molecular parasites occupy critical positions within 
essential genes, most mutations of this sort are lethal 
and only silent mutations are tolerated. HEases thus 
adapted by being able to cut variants of the target site 
carrying synonymous substitutions. The far-reaching 
implication of this finding is that one can use translated 
BLAST and find many formerly unidentified HEase 
targets in the human and other genomes (see below). 
To demonstrate this general principle, we examined the 
target specificity of HEases stemming from two different 
domains of life: PI-SceI, an HEase from the yeast
S. cerevisiae and PI-PspI from the archaeon Pyrococcus 
species GB-D. We assessed the cleavage efficiency of these 
HEases on their native targets as well as on targets where 
all wobble positions underwent transitions. Both enzymes 
were found to cleave a target bearing 10 (PI-PspI) or 11
(PI-SceI) silent mutations even better than they do their
original target, while a single non-silent substitution completely
eliminated cleavage by PI-SceI and reduced
cleavage by PI-PspI (Figure 2a and b). Tolerance of
wobble substitutions does not imply promiscuity.
Cleavage by PI-SceI is almost completely prevented by
non-synonymous substitutions in any one of nine different 
positions along its target (21). Thus, HEase target recognition
combines tolerance for mutations in synonymous
positions with increased specificity for the non-synonymous
positions [albeit restricted; e.g. tolerance to
some non-synonymous substitutions (21,23), Figure 2b].
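
The synonymous relationship between the native and the wobble-mutated PI-SceI targets can be checked directly. The minimal Python sketch below takes the two DNA sequences shown in Figure 2a (bottom: native target; top: target mutated at all wobble positions), translates them with the standard genetic code and lists the positions at which they differ. The function and variable names are illustrative and are not part of the original study.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def translate(dna):
    """Translate a DNA string in frame 1 using the standard genetic code."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def compare_targets(native, mutant):
    """Return (0-based position, native base, mutant base) per mismatch."""
    return [(i, n, m) for i, (n, m) in enumerate(zip(native, mutant)) if n != m]

NATIVE = "ATCTATGTCGGGTGCGGAGAAAGAGGTAATGAA"
MUTANT = "ATTTACGTTGGATGTGGGGAGAGGGGCAACGAG"

diffs = compare_targets(NATIVE, MUTANT)
same_peptide = translate(NATIVE) == translate(MUTANT)
all_wobble = all(pos % 3 == 2 for pos, _, _ in diffs)        # third codon base
all_transitions = all((n, m) in TRANSITIONS for _, n, m in diffs)
print(translate(NATIVE), len(diffs), same_peptide, all_wobble, all_transitions)
```

Both sequences translate to IYVGCGERGNE, and the 11 mismatches all fall on third codon positions and are all transitions, in line with the 11 silent mutations reported for PI-SceI.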

Our results imply that HEases residing in protein- 
coding genes can cleave many DNA sequences having 
the same translation as their native target. This increases 
the odds of finding an HEase that can cleave any 
disease-associated gene or pre-specified genomic safe 
harbor by several orders of magnitude. For example,
a 24-bp long DNA sequence is found at random every
$4^{24} \approx 3 \times 10^{14}$ bp while its eight amino acid long translation
is found at random every $20^{8} \approx 2.5 \times 10^{10}$ codons. It is
therefore expected that within a large enough array of 
HEases at least some will be found able to cleave the

[Figure 2, panel (a): PI-SceI. Amino acid translation of the target: IYVGCGERGNE. Top DNA sequence (mutated at all wobble positions): ATTTACGTTGGATGTGGGGAGAGGGGCAACGAG. Bottom DNA sequence (native target): ATCTATGTCGGGTGCGGAGAAAGAGGTAATGAA. The 11 substituted wobble positions are marked between the two sequences.]

Figure 2. Characterization of HEase plasticity in target recognition and its use for finding HEase targets in the human and other genomes. (a) and
(b): HEase cleavage is tolerant of concomitant mutations at all wobble positions along the target site. The cleavage efficiency of the HEases PI-SceI
(from S. cerevisiae) (a) and PI-PspI (from Pyrococcus species GB-D) (b) was assayed on different targets cloned on a bacterial plasmid. The cloned
vectors were then fragmented by a restriction enzyme for the sake of visual clarity. Both HEases cleave targets mutated at all wobble positions (top
DNA sequence) with higher efficiency than they cleave their native targets (bottom DNA sequence). Conversely, a single non-synonymous mutation
(red arrow) is sometimes sufficient to abolish cleavage by PI-SceI and to reduce cleavage by PI-PspI. (c) and (d): PI-SceI can cleave its predicted
targets from the human ATP6V1A1 gene and its homologs in the genomes of animal models. (c) Alignment of the native target of the PI-SceI HEase
from S. cerevisiae with the predicted targets in the human ATP6V1A1 gene and its homologs in the genomes of animal models. (d) Results of an
in vitro cleavage assay demonstrating that PI-SceI can cleave its predicted targets from the genomes of diverse organisms. UC, uncut; RE, cut by a
restriction enzyme (XbaI); HEase, cut by an HEase; RE + HEase, cut by XbaI and by PI-SceI. S. cerevisiae, Saccharomyces cerevisiae; H. sapiens,
Homo sapiens; C. familiaris, Canis familiaris; C. jacchus, Callithrix jacchus; M. musculus, Mus musculus; G. gallus, Gallus gallus; R. norvegicus, Rattus
norvegicus; D. rerio, Danio rerio.

human genome at desired loci. Furthermore, we recall that
HEGs reside within introns or inteins found in conserved 
motifs. Therefore, in many instances, finding an HEase 
target in the human genome is not a random event but 
rather a result of evolutionary motif conservation from 
the HEase hosting microbe to humans. For example, the 
PI-SceI HEase originates from an intein lying in the
vacuolar membrane ATPase (VMA) gene of the budding
yeast S. cerevisiae. The S. cerevisiae VMA is a homolog of
the human vacuolar ATPase ATP6V1A1 and has a high
degree of sequence similarity in the regions flanking the
PI-SceI-intein insertion site (Figure 2c; note that the intein
is present only in the yeast; there are no inteins in
humans). Indeed, PI-SceI can specifically cleave a
PCR-amplified segment of the human ATP6V1A1 locus
including the predicted target (Figure 2d). We note that
vacuolar ATPases are recognized as a potential therapeutic
target for the treatment of osteoporosis (39). In particular,
the vacuolar ATPase inhibitor, FR167356, has been
shown to prevent bone resorption in ovariectomized rats
(40). It is, therefore, implied that PI-SceI may be used for
ATP6V1A1 disruption in desired tissues. Moreover,
because the target recognition of native HEases is based
on evolutionary conservation, we predicted that PI-SceI
would be able to cleave not only the human ATP6V1A1
but also the ATPase gene homologs in animal models
(Figure 2c), thus facilitating pre-clinical efficacy and
safety assessments. We cloned the predicted target from
different animals in a bacterial plasmid and subjected the
vector to PI-SceI cleavage. We observed efficient cleavage
of all ATP6V1A1 homologs by PI-SceI (Figure 2d).

Construction of HomeBase, the HEase database 

Acknowledging the potential applicability of native 
HEases, we set out to develop a high-throughput 
approach for the detection of HEGs in sequence databases 
and the prediction of HEase targets in the human and 
other genomes. Our algorithm for the construction 
of the 'HomeBase' HEase database is depicted in 
Figure 3a. As a crude first step, we used manually 
Figure 3. The HomeBase algorithm and database. (a) Database construction involves: (i) the identification of putative HEGs in both genomic and 
metagenomic databanks using translated BLAST searches; (ii) the identification of vacant homologs using a second translated BLAST; (iii) the 
identification of exon boundaries and the prediction of the target based on the vacant homologs, and the translation of the target to define the target 
range. The HomeBase database consists of a set of HEG sequences and the predicted target range of each HEase. The user may then query 
HomeBase with (e.g.) a human gene of interest and receive a list of HEases, which are predicted to cleave the query sequence based on a translated 
BLAST search against the targets in the database. (b) Statistics of the HomeBase database. (c) A pie diagram depicting the percentage of previously 
annotated versus newly discovered HEGs within the results of the HomeBase algorithm. (d) Classification of HomeBase HEGs into structural 
families. 
curated collections of known HEGs (the above-mentioned 
INBASE and GISSD databases) as queries in a translated 
BLAST to find putative HEGs in the genomic and 
metagenomic DNA databanks (metagenomic DNA 
databanks hold DNA sequences of uncultured organisms, 
mostly from environmental samples). This BLAST search 
is conducted with a very permissive similarity threshold 
(E-value < 10). 
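
As an illustration of this permissive first pass, the following minimal sketch filters translated-BLAST hits by E-value. It assumes the hits are available as tab-separated lines in the standard 12-column BLAST+ tabular layout (E-value in the 11th column); the input format and the names `Hit`, `parse_tabular` and `permissive_hits` are assumptions made for this example and are not taken from the HomeBase implementation.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str      # query sequence identifier (a known HEG)
    subject: str    # subject sequence identifier (a putative HEG)
    evalue: float   # BLAST E-value of the alignment


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated BLAST hits, skipping blank and comment lines."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Columns 1, 2 and 11 hold query id, subject id and E-value.
        hits.append(Hit(fields[0], fields[1], float(fields[10])))
    return hits


def permissive_hits(hits: Iterable[Hit], max_evalue: float = 10.0) -> List[Hit]:
    """Keep every hit whose E-value is below the permissive threshold."""
    return [h for h in hits if h.evalue < max_evalue]
```

A threshold this loose is expected to admit many false hits, which is why the subsequent filters are needed.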

We next used several filters to separate true HEGs from 
false hits. First, we screened for a feature that is unique to 
HEGs — the presence of a vacant homolog — defined as a 
homolog of the HEG-hosting gene that lacks the intron or 
intein in the respective locus (Figure 1). HEG-hosting 
genes are expected to have vacant homologs because the 
phylogenetic distribution of mobile introns and inteins is 
typically non-monophyletic (41,42); if a certain gene in a 
certain species codes a mobile intron, it is highly probable 
that there exists a related species in which the homologous 
gene is intronless. Conversely, any false hit of the initial 
BLAST search that is not encoded in an intron or an 
intein will be excluded for lack of a vacant homolog. 
HEG-less introns and inteins may have vacant 
homologs, but these were excluded for being shorter 
than a pre-determined length threshold (see 'Materials 
and Methods' section). These filters allowed us to use a 
very low homology threshold in search for novel genes, 
while keeping the frequency of false hits relatively low 
(Figure 3b and c). Thereby many highly divergent novel 
HEGs can be discovered. HEGs residing in non-coding 
RNA genes were excluded from HomeBase because their 
target plasticity is not based on synonymous/ 
non-synonymous positions. 
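
The filtering logic described above can be sketched as follows. This is a simplified illustration rather than the HomeBase code: the `Candidate` record and its fields are hypothetical, and the length threshold is left as a required argument because its value is given only in the 'Materials and Methods' section.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Candidate:
    name: str
    has_vacant_homolog: bool      # a homolog lacking the intron/intein was found
    insertion_length_bp: int      # length of the intron/intein insertion
    in_protein_coding_gene: bool  # False for hosts that are non-coding RNA genes


def filter_candidates(cands: Iterable[Candidate], min_insertion_bp: int) -> List[Candidate]:
    """Apply the three filters: vacant homolog, minimum length, coding host."""
    return [
        c for c in cands
        if c.has_vacant_homolog                       # drops hits outside introns/inteins
        and c.insertion_length_bp >= min_insertion_bp  # drops HEG-less introns/inteins
        and c.in_protein_coding_gene                  # drops HEGs in non-coding RNA genes
    ]
```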

Using this method we have been able to find 684 HEGs 
(Figure 3b and c), 70% of which have not been annotated 
before as HEGs (60% in sequenced genomes and 10% in 
metagenomic sequences). In a manual inspection of a 
random sample of 30 predictions we found 75 ± 7% 
(mean ± SE) to be true HEGs with correct identification 
of the target site (see 'Materials and Methods' section). 
We then complemented our original data with those of 
others (37,38,43) to conclude with a final database of ap- 
proximately a thousand distinct HEGs. To facilitate the 
applicability of our database, we developed a web-server, 
which allows one to search a DNA sequence of choice 
(e.g. a human gene) for putative targets of HEases. The 
web-server can be freely accessed at: 
http://homebase-search.tau.ac.il/. 

The significant addition of novel HEGs to the set of 
known HEGs can be used to expand the query set of the 
HomeBase algorithm. We conducted a second iteration 
of the entire pipeline using the union of the original 
query set with the 684 HEase-coding sequences that 
were identified in the first iteration of HomeBase. This 
procedure extended the predictions by 222 additional 
HEGs (Figure 3b) demonstrating that the potential of 
novel HEGs in the extant sequence databases has not 
yet been exhausted. We nevertheless hold the results of 
the second iteration as a distinct set and not united with 
our initial 684 results, because, as can be expected, we 
observed significantly lower prediction accuracy after the 
iterative process (see 'Materials and Methods' section). 
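
The second iteration can be pictured with the short sketch below. Here `run_pipeline` is a hypothetical stand-in for the whole search-and-filter procedure (a function from a set of query identifiers to a set of predicted HEGs); the sketch only shows how the query set is expanded and how the second-iteration results are kept apart from the first.

```python
from typing import Callable, Set, Tuple


def second_iteration(
    seed_queries: Set[str],
    run_pipeline: Callable[[Set[str]], Set[str]],
) -> Tuple[Set[str], Set[str]]:
    """Run the pipeline twice and return (first_results, new_in_second)."""
    first = run_pipeline(seed_queries)
    # Re-run with the union of the original queries and the first results.
    expanded = seed_queries | first
    # Report only predictions that the first iteration did not find,
    # so the lower-accuracy second set stays distinct.
    second_only = run_pipeline(expanded) - first
    return first, second_only
```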



We note that the NCBI database holds 1386 sequences 
annotated as HEGs, only 213 of which (15%) were 
retrieved by our algorithm. This is expected because of 
our stringent demand for the presence of a vacant 
homolog in the databases. Importantly, these vacant 
homologs facilitate the identification of the target site of 
each HEase. As the intron/intein insertion site marks the 
target sequence of the resident HEase, we assigned each 
HEase with a predicted target site composed of the DNA 
sequences flanking the inferred splice sites from the exon/ 
extein side (Figure 1). The exact boundaries of each target 
could not be automatically predicted. HEase targets vary 
in length and many are asymmetrically positioned with 
respect to the intron/intein insertion site. As a partial 
remedy, each HomeBase record holds seven codons 5' 
and seven codons 3' to the splice site, which is enough 
to encompass the target of any characterized HEase. 
Any predicted HEase target (e.g. in the human genome) 
would align to a significant and central subsequence of 
these 14 codons, hopefully encompassing the true target 
of the HEase (see 'Materials and Methods' section). 
Importantly, HomeBase assigns each HEase not with a 
target sequence but rather with a target range. As dis- 
cussed above, an HEase encoded within a protein-coding 
gene is expected to cleave many DNA sequences that have 
the same translation as its native target (Figure 2). We, 
therefore, use the amino acid translation of each target to 
define the HEase target range, increasing the odds of 
finding HEase targets in the human and other genomes 
by several orders of magnitude. Our algorithm has 
retrieved 416 unique target ranges for the 684 HEases 
identified (Figure 3b). 
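
To make the notion of a target range concrete, the sketch below extracts seven codons on each side of an insertion site from a vacant homolog and compares two DNA sequences by their translation. It is an illustration under stated assumptions: the standard genetic code is used throughout (some hosts, such as fungal mitochondria, use other codes), the insertion site is assumed to fall on a codon boundary, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def target_window(vacant_cds: str, insertion_codon: int, flank: int = 7) -> str:
    """Return `flank` codons 5' and `flank` codons 3' of the insertion site."""
    start = max(0, (insertion_codon - flank) * 3)
    end = min(len(vacant_cds), (insertion_codon + flank) * 3)
    return vacant_cds[start:end]


def same_target_range(dna_a: str, dna_b: str) -> bool:
    """True if two DNA sequences share the same amino acid translation."""
    return translate(dna_a) == translate(dna_b)
```

For example, `same_target_range("GCTGAA", "GCCGAG")` holds because both sequences encode Ala-Glu, although they differ at two synonymous positions.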

After the first translated BLAST used to identify 
putative HEGs and the second translated BLAST used 
to find vacant homologs, we applied a third translated 
BLAST to find HEase targets of therapeutic or biotech- 
nological uses. At this stage, the amino acid target ranges 
are aligned to the targeted genome, human or other. First, 
we searched for hits in and around human genes 
associated with hereditary and other diseases, as listed in 
NCBI's OMIM database (http://www.ncbi.nlm.nih.gov/ 
omim). Importantly, the match in the human gene need 
not be in the translated reading frame of the target gene or 
even on the sense strand. Moreover, a match in an intron 
or in the 5'-UTR can be equally useful (for inserting a 
cDNA under the endogenous promoter but upstream to 
the deleterious mutations). 
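
The point that a match may lie in any reading frame and on either strand can be illustrated with a six-frame scan. The sketch below looks for exact occurrences of a translated target in all six frames of a DNA sequence; this is a deliberate simplification, since a translated BLAST also tolerates mismatches, and the function names and the use of the standard genetic code are assumptions of this example.

```python
from itertools import product
from typing import List, Tuple

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate DNA in frame 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frame_hits(dna: str, peptide: str) -> List[Tuple[str, int, int]]:
    """Return (strand, frame, residue_offset) for each exact peptide match."""
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                hits.append((strand, frame, pos))
                pos = protein.find(peptide, pos + 1)
    return hits
```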

Table 1 presents selected results of potential medical 
use. The selection exemplifies the special emphasis that 
we have given to those targets that were found in the 
human genome as a result of genuine homology and con- 
servation of the targeted protein motif from the native 
microbe to humans: these targets are also found in the 
homologous loci of animal models and are thus useful 
for pre-clinical experiments. 

For each human target the table indicates the thera- 
peutic relevance, as well as the natural host species of 
the HEase, the level of identity between the predicted 
(translated) HEase target and the (translated) human 
sequence, the classification into HEase families, and an 
estimate of the number of off-target cleavage sites in the 
Table 1. Selected results of medical significance 

| Source of HEase (a)             | HEase family | Human gene | Therapeutic relevance                     | Target site identity (b) | Off-target hits (c) |
|---------------------------------|--------------|------------|-------------------------------------------|--------------------------|---------------------|
| Botrytis cinerea                | LAGLIDADG    | PRPF8      | Retinitis pigmentosa                      | 11/11                    | 0                   |
| Haloferax volcanii              | LAGLIDADG    | POLD1      | Colon and colorectal cancer               | 9/10                     | 0                   |
| Metagenomic                     | LAGLIDADG    | FANCA      | Fanconi anemia                            | 8/9                      | 1                   |
| Metagenomic                     | GIY-YIG      | LMNA       | Dilated cardiomyopathy                    | 8/9                      | 8                   |
| Trichodesmium erythraeum IMS101 | LAGLIDADG    | SMARCAL1   | Schimke immunoosseous dysplasia           | 11+1/12                  | 14                  |
| Metagenomic                     | LAGLIDADG    | VCP        | Inclusion body myopathy                   | 10+1/12                  | 1                   |
| Metagenomic                     | LAGLIDADG    | CYP11B2    | Low renin hypertension; hypoaldosteronism | 8/8                      | 3                   |
| Metagenomic                     | LAGLIDADG    | SPG7       | Spastic paraplegia 7                      | 9+1/11                   | 3                   |

(a) Often, a family of related HEases can each cleave the same given target. Here, we list representatives of such families. Our online database and 
server, http://homebase-search.tau.ac.il/, hold all members of such an HEase family, when relevant. 

(b) X+Y/Z: X identities and Y similarities out of a Z amino acid long alignment between the translation of the indigenous target and the translation of 
a putative target sequence in the human gene. An eight amino acid long sequence is predicted to be unique within a random sample of the size of the 
six-frame translation of the human genome. The targets of many HEases are not longer than 8 x 3 = 24 bp. 

(c) Off-target hits: number of BLAST hits in the human genome whose E-value was less than 3 times worse (larger) than the E-value of the desired hit. 



human genome (see 'Materials and Methods' section). A 
list of the top 66 HomeBase hits in human genes is given in 
Supplementary Table S2. These hits are mostly from 
LAGLIDADG HEases (79%) and some GIY-YIG 
(15%), HNH (3%) and His-Cys (3%). The number of 
off-target cleavage sites for these HEases ranges from 
0 to 112, with a median of 9. 
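
Two quantities from the footnotes of Table 1 lend themselves to a short worked example: the expected number of chance occurrences of a peptide in a six-frame translation, and the off-target count defined by the E-value criterion. The sketch below is illustrative only. It assumes a haploid human genome of roughly 3.1 x 10^9 bp and uniformly distributed amino acids (both simplifications introduced here, not taken from the text), and the function names are hypothetical.

```python
from typing import Iterable


def expected_random_matches(peptide_len: int, genome_bp: float, alphabet: int = 20) -> float:
    """Expected chance occurrences of one peptide in a six-frame translation.

    Six frames of genome_bp / 3 residues each give about 2 * genome_bp
    residue positions; each matches a given peptide with probability
    alphabet ** -peptide_len under a uniform random model.
    """
    residues = 2 * genome_bp
    return residues / alphabet ** peptide_len


def count_off_targets(desired_evalue: float, other_evalues: Iterable[float],
                      factor: float = 3.0) -> int:
    """Count hits whose E-value is less than `factor` times worse (larger)
    than the E-value of the desired hit."""
    return sum(1 for e in other_evalues if e < factor * desired_evalue)


# Under these assumptions an 8-residue peptide is expected fewer than once,
# whereas a 7-residue peptide is expected several times.
HUMAN_GENOME_BP = 3.1e9  # approximate haploid size, an assumption of this sketch
print(expected_random_matches(8, HUMAN_GENOME_BP) < 1)
print(expected_random_matches(7, HUMAN_GENOME_BP) > 1)
```

Under this random model, eight residues is the shortest length at which a match is expected to be unique, consistent with the remark that many HEase targets are not longer than 24 bp.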

We note that some HEases have targets only in 
intergenic human loci (Supplementary Figure S1). Some 
of these loci may prove to be genomic safe harbors, 
allowing for a stable, safe and efficient transgene expres- 
sion (e.g. for the treatment of recessive diseases). Finally, 
the targets of many HEases in HomeBase diverge slightly 
from a desired sequence (e.g. a disease-associated human 
locus). These enzymes may be chosen as scaffolds for 
enzyme engineering to achieve the necessary specificity 
and efficacy. 

Validation of HomeBase predictions 

When a putative HEase is found in an organism with 
highly developed genetic tools, the activity of the 
enzyme can be assayed in the native organism (44). For 
example, HomeBase predicts that the PolB intein of the 
halophilic archaeon H. volcanii encodes an active HEase. 
Our prediction of the intein borders coincides with the 
INBASE predictions (38). We have previously identified 
its endonuclease domain and experimentally validated its 
endonuclease activity. Using an archaeal plasmid assay, 
we have also demonstrated that the H. volcanii HEase 
can cleave its predicted target with high efficiency. The 
assay involves transforming the archaeon with a plasmid 
bearing the HEase target as an integral part of a PolB 
vacant allele. When the plasmid enters the archaeal cell, 
the endogenously expressed HEase can recognize and 
cleave its target and induce intein homing into the 
plasmid via homologous recombination (Figure 4a). 
Intein homing can be detected by PCR amplification 
using target flanking, plasmid-specific primers. In the 
context of the present HomeBase gene targeting applica- 
tion, we now substituted the native target of the enzyme 
with its predicted target in the human homolog of PolB, 
the DNA polymerase delta gene (PolD1, Figure 4b). We 
find that high cleavage efficiency is retained (Figure 4c and 
d). We note that PolD1 in eukaryotes is involved in DNA 
repair and that mutations in this gene in humans are 
associated with colon cancer and with sporadic colorectal 
carcinomas (45). 

The H. volcanii HEase is a special case in that its activity 
on the predicted human target could easily be assayed in 
the natural host, because methods of genetic manipula- 
tions are highly developed for this archaeal species. It 
should also be noted that H. volcanii is halophilic and, 
therefore, the enzyme is expected to have reduced 
activity in human cells [although extremophilic HEases 
can be readily adapted to mesophilic conditions by 
genetic engineering (46)]. Therefore, we wanted to 
develop a more robust validation method that would 
allow us to verify the activity of HEases on their predicted 
targets in a eukaryotic setting. Following Chames et al. 
(47), we designed an assay in the budding yeast 
S. cerevisiae, wherein the predicted HEase target is 
inserted between the truncated repeats of the genes 
encoding the metabolic enzymes Ura3 and Lys2. The 
HEG is plasmid borne and is expressed from a 
galactose-induced promoter. Upon HEase cleavage, the 
truncated repeats recombine and reconstitute the metabol- 
ic genes, allowing the yeast to grow on medium lacking 
uracil and lysine (Figure 5a). 

The intein encoded in the gene for the splicing factor 
PRP8 of the fungus B. cinerea has recently been shown to 
be an active HEase (35). According to the HomeBase 
paradigm, this enzyme should be able to cleave the 
PRP8 human homolog PRPF8, mutations in which 
cause the progressive blindness disorder retinitis 
pigmentosa (48). The amino acid sequences of the fungal 
and human genes share a high degree of similarity, while 
the nucleotide sequences have diverged (Figure 5b). 
We designed two yeast constructs, one carrying the 
native B. cinerea target between the truncated repeats of 
the metabolic markers and the other carrying the pre- 
dicted human target. Enzyme activity is highest for the 
natural target but is also very high for the human target 
[respectively: ~800- and ~50-fold increase in efficacy; 
Figure 5c and Supplementary Figure S2 showing similar 
Figure 4. The PolB HEase of H. volcanii can cleave a target sequence from the human gene PolD1. (a) A PCR is performed on individual H. volcanii 
colonies transformed with a plasmid bearing a vacant PolB allele coding either the native target of the PolB HEase or a homologous sequence from 
the human PolD1 gene. The PCR (white arrows denote primers) can amplify either a short product in the absence of homing or a long 
product if homing has taken place. (b) Nucleotide and amino acid alignments of the target sequence from the H. volcanii PolB gene and the 
homologous human sequence from the PolD1 gene. (c) Representative results of the PCR assay in cells carrying a plasmid with the human target 
sequence. A long PCR product indicates that homing has occurred. (d) The relative homing efficiency of the PolB HEase to the plasmid-borne vacant 
homolog carrying either native (archaeal) or human targets, or no target (n = 30). Error bars represent 95% confidence intervals based on Monte 
Carlo simulations. 



results using the homologous target sequence from the 
mouse PRPF8 gene]. 

Botrytis cinerea is a fungus, as is the budding yeast. 
However, the yeast assay can demonstrate the activity of 
HEases from diverse phyla. We applied this assay 
to cyanobacteria to demonstrate this ability. The 
ribonucleotide reductase (RNR) of Nostoc species 
PCC7120 encodes an intein with characteristic HEase 
motifs. We used the yeast assay to evaluate the Nostoc 
HEase activity on its predicted target. In spite of the 
large phylogenetic distance between S. cerevisiae and 
Nostoc, the cyanobacterial enzyme cleaved its predicted 
target in yeast with extremely high efficiency (Figure 5d 
and e). We note that cyanobacteria biotechnology is an 
exponentially growing field with implications in biofuel 
production (49) and bioremediation (50,51). Many 
Nostoc strains and species of related cyanobacteria 
encode vacant RNR genes, lacking the intein and there- 
fore possess an intact HEase target. While the nucleotide 
sequences diverge between different species, the translation 
is highly conserved (Figure 5d). We found that the 
cleavage efficiency of the HEase from Nostoc species 
PCC7120 on targets from related species of proven 
biotechnological value, such as N. punctiforme (52) and 
Synechococcus (49), is as high as or higher than its activity on 
the endogenous target (Figure 5e). The Nostoc RNR 
HEase exemplifies how the HomeBase paradigm could 
and should be extended far beyond the scope of gene 
therapy alone. Finally, we emphasize that both the 
B. cinerea PRP8 HEase and the Nostoc RNR HEase 
showed no significant toxicity to the budding yeast 
(Supplementary Figure S3) while being nearly as potent 
as the I-SceI gold standard (Supplementary Figure S4). 
This result demonstrates the exquisite specificity of these 
enzymes, because even a single un-repaired DSB is strictly 
lethal in S. cerevisiae (53). 

DISCUSSION 

Attempts to engineer HEases for gene targeting have so 
far focused on a small group of enzymes (54), while the 
plethora and diversity of native HEases have been 
Figure 5. A yeast assay demonstrating the activity of different HEases on their native targets as well as on targets of therapeutic or biotechnological 
uses. (a) The yeast assay for HEase activity [following Chames et al. (47)]. An HEase target site is inserted between truncated Ura3 repeats and 
between truncated Lys2 repeats. Upon HEase cleavage a recombination event reconstitutes the respective metabolic markers. (b and c) The B. cinerea 
PRP8 HEase can cleave the human PRPF8 gene. (b) Nucleotide alignment of the B. cinerea PRP8 HEase-target and the homologous sequence from 
the human PRPF8 gene. The asterisk indicates that the adenine (green A) in the human sequence is the last nucleotide of an intron. The cDNA of 
human PRPF8 has a thymidine at this position (and is part of a tryptophan codon, W). The target used in our assay has adenine to show cleavage 
of the genomic sequence. (c) Relative activity of the B. cinerea PRP8 HEase on its native target and on its human target from the PRPF8 gene (log 
scale). Relative activity is the ratio between the growth rates of strains with or without the HEase-expressing plasmid. (d and e) The Nostoc RNR 
HEase can cleave its predicted target as well as targets in related cyanobacteria of biotechnological use. (d) Nucleotide alignment of the Nostoc 
species PCC7120 RNR HEase-target and the homologous sequence from the RNR genes of N. punctiforme and Synechococcus. (e) Relative activity 
of the Nostoc species PCC7120 RNR HEase on its native target and on sequences from the RNR genes of N. punctiforme and Synechococcus (log 
scale). Error bars represent 95% confidence intervals based on Monte Carlo simulations. N. punctiforme, Nostoc punctiforme. 
overlooked. Our HomeBase platform lists for the first time 
approximately a thousand different HEases alongside 
their predicted targets. The computational pipeline de- 
veloped here is a set of tailor-made methods for HEase 
discovery and characterization that rely on their unique 
biological and evolutionary properties: the presence of 
vacant homologs, their setting within introns/inteins, 
and their predictable tolerance for silent mutations in 
their target sequences. 

Enzyme engineering can now begin by choosing a 
scaffold HEase whose target is most similar to the 
sequence at the target locus of choice. We have also 
developed methods for preliminary validation of HEase 
activity in a eukaryotic setting. Candidate screening in 
yeast can help in the detection of degenerated HEases 
that are naturally common (13,55,56). Notably, evolutionary 
considerations may sometimes only help to approximate 
the intricacies of enzyme specificity, as can be 
revealed, for example, by yeast surface display (57) or by 
cleavage assays with large randomized target libraries (23). 
When suboptimal specificity is revealed, the yeast selection 
system can be used for directed evolution of selected 
HEases (58). However, we have shown here that native 
HEases could sometimes themselves be used for the gene 
targeting of disease-associated genes and genes of biotech- 
nological relevance. HEases possess exquisite specificity 
that has evolved through billions of years. The long 
target sequences of HEases allow the recognition of 
unique sites while considerations of sequence conservation 
allow an approximation of the plasticity in target recog- 
nition. In particular, for those HEases that reside in 
protein-coding genes, we have established that a translated 
BLAST could be used to find cleavable targets in the 
human and other genomes. Even when an HEase has 
more than a single target in a genome of choice, the 
off-target effect may be confined and predictable based 
on evolutionary considerations. This predictability is 
however limited as some tolerance of non-synonymous 
substitutions can be expected (21,23). 

We believe that the safety concerns regarding the use of 
site-specific endonucleases in gene therapy are well 
addressed by native HEases. This is all the more true for 
a subgroup of HEases whose predicted human targets are 
found in homologs of the microbial HEase-hosting gene. 
These enzymes are shown to cleave the same locus in 
humans and in any animal model of choice (Figure 2c 
and d), thus facilitating pre-clinical efficacy and safety 
assessments. The Nostoc RNR exemplifies how such 
enzymes could also be used in the biotechnology 
industry. In this study we focused on members of the 
LAGLIDADG HEase family, comprising ~80% of all 
HEases. Future studies should test the applicability of 
our conclusions to all HEases as may be implied by 
previous reports (28,29). Finally, the genes for native 
HEases are readily available and can be incorporated 
in therapeutic and other vectors with relative ease and 
low costs compared to the engineered alternative. 
We therefore believe that the under-recognized diver- 
sity and plasticity of native HEases should become a 
valuable tool in the fields of gene therapy and genetic 
engineering. 